SORT: Simple Online and Realtime Tracking

AND

DEEP SORT: With a Deep Association Metric

Authors (SORT): Alex Bewley, Zongyuan Ge, Lionel Ott, Fabio Ramos, and Ben Upcroft.

Authors (Deep SORT): Nicolai Wojke, Alex Bewley, and Dietrich Paulus.

Presentation by: George Hendrick

Blog post by: Obiora Odugu

Link to Paper: Read the Paper

Summary of the Paper

SORT (Simple Online and Realtime Tracking) introduces a fast, lightweight object tracking algorithm that works in real-time by using a tracking-by-detection paradigm. It combines object detection (e.g., using Faster R-CNN), motion prediction via the Kalman filter, and data association using the Hungarian algorithm. While SORT achieves impressive speed and accuracy, it struggles with identity switches, especially during occlusions or when objects are close together. Deep SORT extends SORT by incorporating appearance information through deep learning. It uses a convolutional neural network (CNN) trained on a re-identification dataset to extract appearance features, which are then combined with motion data for better data association. This significantly reduces identity switches and improves tracking through occlusions, although it introduces computational overhead and requires a modern GPU for real-time performance.

Presentation Breakdown

Introduction

This slide introduces the SORT tracker, a fast and lightweight system that follows the "tracking-by-detection" paradigm. SORT uses a Kalman filter for motion prediction and the Hungarian algorithm for data association. A key insight is that detection quality significantly impacts tracking accuracy. SORT achieves a tracking speed of 260 Hz, about 20x faster than previous state-of-the-art trackers, although this figure does not include the time taken for object detection.
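To make the motion-prediction step concrete, here is a minimal sketch of a constant-velocity Kalman filter for a single coordinate. It is an illustration only, not the authors' implementation: the SORT paper's state also covers box scale and aspect ratio, and the noise values `q` and `r`, the initial covariance, and the helper `track_positions` are assumptions chosen for the example.

```python
def kalman_predict(x, v, P, q=0.01, dt=1.0):
    """Predict step for state [position x, velocity v] with covariance P (2x2 list)."""
    p00, p01, p10, p11 = P[0][0], P[0][1], P[1][0], P[1][1]
    # Constant-velocity motion: position advances by v * dt, velocity is unchanged.
    x_pred = x + v * dt
    # P' = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I (assumed noise model).
    P_pred = [
        [p00 + dt * (p01 + p10) + dt * dt * p11 + q, p01 + dt * p11],
        [p10 + dt * p11, p11 + q],
    ]
    return x_pred, v, P_pred


def kalman_update(x, v, P, z, r=1.0):
    """Update step for a position-only measurement z with noise variance r."""
    residual = z - x                       # measurement minus predicted position
    s = P[0][0] + r                        # innovation variance
    k0, k1 = P[0][0] / s, P[1][0] / s      # Kalman gain for position and velocity
    x_new = x + k0 * residual
    v_new = v + k1 * residual
    # P' = (I - K H) P with H = [1, 0]
    P_new = [
        [(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
        [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]],
    ]
    return x_new, v_new, P_new


def track_positions(measurements):
    """Hypothetical helper: filter a list of positions, then predict the next one."""
    x, v, P = 0.0, 0.0, [[10.0, 0.0], [0.0, 10.0]]  # assumed vague initial state
    for z in measurements:
        x, v, P = kalman_predict(x, v, P)
        x, v, P = kalman_update(x, v, P, z)
    next_x, _, _ = kalman_predict(x, v, P)
    return next_x, v
```

Feeding in positions that advance by one unit per frame, for example `track_positions([1.0, 2.0, 3.0, 4.0, 5.0])`, yields a velocity estimate close to 1 and a predicted next position close to 6. That prediction is what the tracker compares against the next frame's detections.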

Introduction

This slide introduces Deep SORT, which builds upon SORT by integrating deep learning for appearance-based tracking. A deep association metric, trained on a person re-identification dataset, enhances the system’s ability to handle identity switches and occlusions. This innovation bridges the gap between classical MOT systems and modern deep learning-based approaches by combining speed and robust identity tracking.
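As a rough illustration of an appearance-based metric, the sketch below compares a detection's embedding against a gallery of embeddings stored for a track and keeps the smallest cosine distance. The function names and the tiny two-dimensional vectors are assumptions for the example; in practice the embeddings come from the re-identification CNN and have many more dimensions.

```python
import math


def normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    a, b = normalize(a), normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def appearance_distance(track_gallery, detection_feature):
    """Smallest cosine distance between a detection and any stored track embedding."""
    return min(cosine_distance(f, detection_feature) for f in track_gallery)
```

Keeping several past embeddings per track, rather than only the most recent one, lets a track be re-matched after an occlusion even if the person's appearance changed somewhat in between.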

Motivation

This slide presents the motivation behind SORT and Deep SORT. Prior to these works, MOT methods were either batch-based or traditional online trackers. Batch-based approaches relied on future frame data, making them impractical for real-time use. Meanwhile, online methods required heavy computation and struggled with the trade-off between speed and tracking accuracy. The motivation for these papers was to address these limitations by designing a simple yet effective tracking system suitable for real-time applications.

Literature Background

The background section explores early data association methods in MOT. Techniques such as Global Data Association, Greedy Data Association, and K-Shortest Paths Tracking were all graph-based approaches to matching detections across frames. These methods laid the groundwork for how detection and tracking were handled prior to the rise of deep learning. Models like Faster R-CNN, YOLO, and SSD enabled real-time object detection, which in turn made simple tracking algorithms like SORT more viable. These breakthroughs provided the accuracy and speed needed to allow tracking-by-detection methods to thrive.

Contributions of the Paper

This slide outlines the main contributions of both papers. SORT demonstrated that high-speed tracking was possible without complex models, setting a new benchmark in the MOT field. Deep SORT addressed SORT’s key limitation—identity switches—by incorporating deep feature embeddings. Together, these works influenced MOT research and became widely adopted in autonomous driving, robotics, and surveillance.

Method

The SORT pipeline has four steps: (1) object detection using Faster R-CNN, (2) motion prediction using a Kalman filter, (3) data association with the Hungarian algorithm, and (4) track management to remove lost tracks. The Deep SORT pipeline follows the same steps but adds appearance feature extraction: a CNN-based deep feature embedding is used to compare object appearances. Its data association combines the Mahalanobis distance (motion) with the cosine distance between appearance embeddings.
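The data association step for SORT can be sketched as follows: score each predicted track box against each detection by intersection-over-union (IoU), pick the one-to-one assignment with the highest total IoU, and reject matches whose overlap is too small. This is a simplified sketch, not the authors' code. To stay dependency-free it searches all assignments by brute force as a stand-in for the Hungarian algorithm (which finds the same optimum efficiently), and the threshold `iou_min=0.3` is an illustrative value.

```python
from itertools import permutations


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(predicted, detections, iou_min=0.3):
    """Return (matches, unmatched_tracks, unmatched_detections) as index lists."""
    n, m = len(predicted), len(detections)
    # Enumerate every one-to-one assignment of the smaller set into the larger one.
    # Brute force is only practical for a handful of boxes (stand-in for Hungarian).
    if n <= m:
        candidates = ([(i, p[i]) for i in range(n)] for p in permutations(range(m), n))
    else:
        candidates = ([(p[j], j) for j in range(m)] for p in permutations(range(n), m))
    best_pairs, best_score = [], -1.0
    for pairs in candidates:
        score = sum(iou(predicted[i], detections[j]) for i, j in pairs)
        if score > best_score:
            best_pairs, best_score = pairs, score
    # Reject assigned pairs that overlap too little to be the same object.
    matches = [(i, j) for i, j in best_pairs
               if iou(predicted[i], detections[j]) >= iou_min]
    matched_tracks = {i for i, _ in matches}
    matched_dets = {j for _, j in matches}
    unmatched_tracks = [i for i in range(n) if i not in matched_tracks]
    unmatched_dets = [j for j in range(m) if j not in matched_dets]
    return matches, unmatched_tracks, unmatched_dets
```

In a full tracker, unmatched detections would start new tracks and tracks that stay unmatched would eventually be removed, which is the track management step.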

Experimental Evaluation

The slide presents performance metrics. SORT achieved a high MOTA score, low false negatives, and the fewest lost targets among online trackers. These results confirmed that even simple tracking approaches can achieve state-of-the-art performance when paired with strong detectors.
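For readers unfamiliar with the metric, MOTA (multiple object tracking accuracy) combines false negatives, false positives, and identity switches into one score relative to the number of ground-truth objects. The sketch below uses the standard definition; the function name and the counts in the usage note are made up for illustration and are not results from either paper.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with all counts summed over every frame."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth
```

With hypothetical counts of 20 missed objects, 10 false alarms, and 5 identity switches over 100 ground-truth objects, `mota(20, 10, 5, 100)` gives 0.65. The score can be negative when the tracker makes more errors than there are ground-truth objects.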

Main Insight

This slide reinforces the idea that good tracking depends heavily on detection quality. SORT showed that classical methods could still excel when supported by modern detectors. The paper also positioned SORT as a strong baseline for future MOT improvements.

Main Insight

The Deep SORT paper extends SORT's capabilities to areas such as surveillance systems, where occlusion can confuse the SORT algorithm. Other potential applications are in autonomous driving and robotics.

Future Work

This section discusses extensions of SORT and Deep SORT, including OC-SORT, Hybrid SORT, AM-SORT, and Deep HM-SORT. Each builds upon the original methods by adding cues like velocity, using transformer models, or tailoring algorithms to specific domains such as sports and adverse weather conditions.

Thoughts

The presenter reflects on SORT's speed and simplicity, while recognizing Deep SORT for solving critical tracking issues. The presenter emphasizes that the best approach depends on the application: SORT for speed-critical tasks, Deep SORT for identity-sensitive scenarios.

Discussion and Class Insights

Q1: Do you believe Deep SORT should be considered an upgrade of SORT and should replace it in most scenarios? Is the speed tradeoff worth the greater tracking capability during periods of occlusion?

Bassel and Aleksandar: They suggested that in autonomous vehicles (AVs) occlusion might not even be an issue, so resolving identity switches may matter less there than in the applications on which SORT and Deep SORT were evaluated.

Sujan: Sujan said that Deep SORT is a better fit for privacy-related applications, while SORT can be used in AVs, which supports Bassel and Aleksandar's comment.

Audience Questions and Answers

Professor: Professor Xugui asked if the papers are from the same group.

George: The first paper (SORT) introduced a simpler object tracking method using Kalman filtering and the Hungarian algorithm. The second paper (Deep SORT) extended it with appearance-based tracking. The two papers share an author: Alex Bewley, the first author of SORT, is also a co-author of Deep SORT.

Aleksandar Avdalovic: Does the function on page 18 focus on minimization or maximization, and what does λ (lambda) represent?

George: George responded by referring him to the paper for details. The professor added that λ depends on the loss function and is often a parameter that is tuned experimentally.
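For reference, and assuming the function in question is the combined association cost from the Deep SORT paper (the slide itself is not reproduced here), that cost is written as:

```latex
c_{i,j} = \lambda \, d^{(1)}(i,j) + (1 - \lambda) \, d^{(2)}(i,j)
```

Here d^{(1)}(i,j) is the squared Mahalanobis distance between track i's Kalman prediction and detection j, d^{(2)}(i,j) is the smallest cosine distance between detection j's embedding and the embeddings stored for track i, and λ in [0, 1] weights motion against appearance. Because c_{i,j} is a cost, the assignment step minimizes it. The paper reports λ = 0 as a reasonable choice when there is substantial camera motion, with the Mahalanobis distance still used to gate out implausible matches.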

Sujan Gyawali: How is Deep SORT used in real-world scenarios, and how is it different from YOLO?

George: George explained that it is primarily used for tracking humans in videos, but can be applied to general object tracking, particularly in AVs. He went on to clarify that YOLO is for object detection, while Deep SORT integrates object detection with motion prediction and tracking. The professor added that the general process for AVs involves: sensor data → deep learning model for object detection → motion tracking system for object tracking → planning system → low-level control (braking, gas, etc.). He concluded that better tracking systems lead to better vehicle planning.

Ruslan: Ruslan mentioned that SORT is good for real-time tracking, while Deep SORT is better for accuracy.

George: George clarified that Deep SORT can also be used in real time, but it is slower than SORT.